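The Library-First Engineering Principle represents a paradigm shift from hand-written kernel development to a systems-architecture approach. In the ROCm ecosystem, this philosophy holds that engineering resources should concentrate on application-level logic, while device-specific tuning is left to AMD's specialized libraries.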
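1. The Philosophical Shift
A seasoned GPU engineer does not ask "Can I write this kernel?" but rather "Should I write this kernel?" Custom kernels often become technical debt. Libraries such as rocBLAS and rocFFT embody thousands of hours of assembly-level tuning that a single developer can rarely match.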
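2. Using Libraries Aggressively
Choosing libraries by default lets the application pick up "free" performance gains. When AMD releases a new architecture (e.g., CDNA 3), a library update can deliver optimizations for it without changing a single line of host code.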
QUESTION 1
What is the primary mandate of the Library-First Engineering Principle?
To write custom HIP kernels for every operation to ensure maximum control.
To default to existing ROCm libraries before attempting custom HIP implementations.
To prioritize CPU execution over GPU acceleration.
To minimize the use of AMD-native headers.
✅ Correct!
Defaulting to libraries ensures you benefit from vendor-tuned performance and reduces technical debt.
❌ Incorrect
Writing custom kernels by default is considered inefficient in a 'Library-First' philosophy.
QUESTION 2
According to the lesson, how should custom kernels be treated in a production environment?
As the primary mode of operation.
As technical debt that must be justified by extreme edge cases.
As assets that increase the value of the codebase significantly.
As temporary placeholders for library functions.
✅ Correct!
Custom kernels require manual maintenance for every new GPU generation, whereas libraries handle this abstraction for you.
❌ Incorrect
The principle views custom code as a maintenance burden unless it provides a unique competitive advantage.
QUESTION 3
What is a major benefit of using ROCm libraries when transitioning between GPU architectures (e.g., CDNA 2 to CDNA 3)?
The developer must rewrite the kernel in assembly.
The developer receives 'free' performance gains via library updates.
The developer must manually adjust thread block sizes.
Libraries prevent the use of newer hardware features.
✅ Correct!
AMD tunes the libraries for specific silicon; updating the library package often boosts performance without source code changes.
❌ Incorrect
One of the greatest strengths of libraries is hardware abstraction.
QUESTION 4
Which question characterizes the maturity of a GPU engineer?
"How can I maximize my line count?"
"Can I write this kernel?"
"Should I write this kernel?"
"Is there a way to avoid using handles?"
✅ Correct!
A mature engineer prioritizes efficiency, maintenance, and performance over the pride of writing custom code.
❌ Incorrect
Just because you 'can' write something doesn't mean it is the best use of project resources.
QUESTION 5
Which ROCm library would a 'Library-First' team use to replace a 3D Stencil kernel if possible?
rocSPARSE or rocFFT
hipInfo
ROCm-SMI
rocAL
✅ Correct!
Many stencil operations can be mapped to frequency domain transforms or sparse matrix operations already optimized in these libraries.
❌ Incorrect
ROCm-SMI is for device management; hipInfo only reports device properties; rocAL is for data loading and augmentation. rocSPARSE and rocFFT are the compute engines.

Architectural Migration Challenge
Applying Library-First Principles to Legacy Systems
You are tasked with migrating a seismic imaging application that contains multiple custom-written HIP kernels for Fourier transforms and vector additions. The code currently requires manual tuning every time the hardware is upgraded from Radeon Pro to Instinct GPUs.
Q
Identify the primary step in the migration workflow regarding kernel and host code separation.
Solution:
The developer should split the kernel and host code into separate source files. This modularity allows for the incremental replacement of custom `__global__` functions with calls to optimized libraries like rocFFT or rocBLAS without disrupting the high-level application flow or memory management logic.
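The separation described in this solution can be sketched in host-side Python. This is only an illustration of the seam, not ROCm code: `custom_vector_add` stands in for a hand-written `__global__` kernel, `axpy` stands in for a BLAS-style routine (rocBLAS provides axpy, which computes alpha * x + y), and `stack_traces` is a hypothetical host routine invented for the example. Unlike a real BLAS axpy, the stand-in returns a new list instead of updating `y` in place.

```python
from typing import Callable, List

Vector = List[float]


def custom_vector_add(x: Vector, y: Vector) -> Vector:
    """Stand-in for a hand-written kernel: one element per 'thread'."""
    return [xi + yi for xi, yi in zip(x, y)]


def axpy(alpha: float, x: Vector, y: Vector) -> Vector:
    """Stand-in for a BLAS-style axpy: returns alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def library_vector_add(x: Vector, y: Vector) -> Vector:
    """Vector addition expressed as axpy with alpha = 1."""
    return axpy(1.0, x, y)


def stack_traces(
    traces: List[Vector],
    vector_add: Callable[[Vector, Vector], Vector],
) -> Vector:
    """Hypothetical host flow: accumulates traces without knowing
    which implementation performs the addition."""
    total = [0.0] * len(traces[0])
    for trace in traces:
        total = vector_add(trace, total)
    return total


traces = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5], [2.0, 0.0, -1.0]]
before = stack_traces(traces, custom_vector_add)   # legacy path
after = stack_traces(traces, library_vector_add)   # library-backed path
assert before == after
print(after)
```

Because `stack_traces` only depends on the call signature, the custom implementation can be swapped for the library-backed one function at a time, and the two can be compared against each other during the migration.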
Q
Why would a 'Library-First' approach be faster to implement for a team of developers?
Solution:
By mapping operations to libraries, the team can reach 95%+ of theoretical peak performance for well-supported operations right away. They avoid the weeks or months typically spent on micro-architectural tuning (tiling, occupancy, shared memory bank conflicts), problems that are already solved within the pre-built ROCm library binaries.
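As one illustration of what "mapping operations to libraries" can mean, take the stencil case from Question 5. The sketch below assumes a 1D three-point stencil with periodic boundaries, which is exactly a circular convolution and can therefore be computed through a Fourier transform. The naive `dft` here is a slow pure-Python stand-in for a library FFT such as rocFFT, and all function names are hypothetical.

```python
import cmath
from typing import List


def stencil_direct(u: List[float], w_left: float, w_center: float,
                   w_right: float) -> List[float]:
    """Three-point stencil with periodic (wrap-around) boundaries."""
    n = len(u)
    return [
        w_left * u[(i - 1) % n] + w_center * u[i] + w_right * u[(i + 1) % n]
        for i in range(n)
    ]


def dft(x: List[complex], inverse: bool = False) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (stand-in for a library FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
                  for j in range(n))
        out.append(acc / n if inverse else acc)
    return out


def stencil_via_dft(u: List[float], w_left: float, w_center: float,
                    w_right: float) -> List[float]:
    """Same stencil, written as a circular convolution in frequency space."""
    n = len(u)
    assert n >= 3, "sketch assumes at least three grid points"
    kernel = [0.0] * n
    kernel[0] = w_center      # multiplies u[i]
    kernel[1] = w_left        # multiplies u[i - 1]
    kernel[n - 1] = w_right   # multiplies u[i + 1]
    product = [a * b for a, b in zip(dft(kernel), dft(u))]
    return [value.real for value in dft(product, inverse=True)]


u = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
direct = stencil_direct(u, 1.0, -2.0, 1.0)    # discrete Laplacian weights
spectral = stencil_via_dft(u, 1.0, -2.0, 1.0)
assert all(abs(a - b) < 1e-9 for a, b in zip(direct, spectral))
```

Whether the transform route is actually faster than a direct stencil depends on stencil width, grid size, and boundary conditions, so the mapping is worth benchmarking before committing to it.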